Systems, apparatuses, and methods for deceptive infusion of data

ABSTRACT

Systems, apparatuses, and methods for deceptive infusion and obfuscation of data are disclosed. An apparatus includes a communication terminal and processing circuitry. The communication terminal is configured to transmit information to an artificial intelligence engine. The processing circuitry is configured to decompose raw data into fundamental metadata and inference metadata. The processing circuitry is also configured to generate one or more concealment operators and generate a deception kernel responsive to the inference metadata, the one or more concealment operators, and/or the fundamental metadata. The processing circuitry is configured to obfuscate the fundamental metadata responsive to the one or more concealment operators and the deception kernel, and provide the obfuscated fundamental metadata and the inference metadata to the artificial intelligence engine for processing.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application Ser. No. 63/227,389, filed Jul. 30, 2021, the disclosure of which is hereby incorporated herein in its entirety by this reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This disclosure was made with government support under Contract Number DE-AC07-05-1D14517 awarded by the United States Department of Energy. The government has certain rights in the disclosure.

TECHNICAL FIELD

Embodiments of the present disclosure relate generally to obfuscation of raw data before artificial intelligence or machine learning processing of the data.

BACKGROUND

Owners of a proprietary system (e.g., scientific experiment or applied technology) may want to publicly disseminate findable, accessible, interoperable, and reusable (FAIR) scientific data to artificial intelligence/machine learning (AI/ML) researchers for processing. However, the data may contain sensitive or classified information that should not be publicly shared. Furthermore, even if the data is sanitized, the owners may be reluctant to share the information for fear that confidential information may be reverse engineered or compromised.

BRIEF SUMMARY

Embodiments disclosed herein include methods, systems, and/or apparatuses configured to obfuscate fundamental metadata for AI/ML processing. Some embodiments include an apparatus including a communication terminal configured to transmit information to an artificial intelligence engine. The apparatus may also include processing circuitry configured to decompose raw data into fundamental metadata and inference metadata. The processing circuitry may be configured to generate one or more concealment operators and a deception kernel responsive to the inference metadata and the one or more concealment operators. The processing circuitry may be further configured to obfuscate the fundamental metadata responsive to the one or more concealment operators and the deception kernel. The processing circuitry may then be configured to provide the obfuscated fundamental metadata and the inference metadata to the artificial intelligence engine for processing.

Additional embodiments include a system including a deception engine configured to decompose raw data into fundamental metadata and inference metadata. The deception engine may be configured to generate one or more concealment operators and a deception kernel responsive to the inference metadata and the one or more concealment operators. The deception engine may also be configured to obfuscate the fundamental metadata responsive to the one or more concealment operators. The system may also include an artificial intelligence engine configured to receive data from the deception engine, the data comprising the obfuscated fundamental metadata and the inference metadata, process the received data, and provide the processed data to the deception engine.

Additional embodiments may be directed to a method including decomposing raw data into fundamental metadata and inference metadata. The method may further include generating one or more concealment operators and a deception kernel responsive to the inference metadata and the one or more concealment operators. The method also includes obfuscating the fundamental metadata responsive to the one or more concealment operators and the deception kernel, and providing the obfuscated fundamental metadata and the inference metadata to an artificial intelligence engine for processing.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart depicting a method for obfuscating fundamental metadata from benchmark raw data, according to some embodiments.

FIG. 2 is a block diagram depicting an apparatus for obfuscating and providing inference metadata to an artificial intelligence engine, according to one or more embodiments of the present disclosure.

FIG. 3 is a block diagram depicting a system for obfuscating and processing inference metadata, in accordance with one or more embodiments.

FIG. 4 is a flowchart depicting a method for operating a data processing network, according to one or more embodiments of the present disclosure.

DETAILED DESCRIPTION

In the following detailed description, reference is made to the accompanying drawings, which form a part hereof, and in which are shown, by way of illustration, specific examples of embodiments in which the present disclosure may be practiced. These embodiments are described in sufficient detail to enable a person of ordinary skill in the art to practice the present disclosure. However, other embodiments enabled herein may be utilized, and structural, material, and process changes may be made without departing from the scope of the disclosure.

The illustrations presented herein are not meant to be actual views of any particular method, system, device, or structure, but are merely idealized representations that are employed to describe the embodiments of the present disclosure. In some instances, similar structures or components in the various drawings may retain the same or similar numbering for the convenience of the reader; however, the similarity in numbering does not necessarily mean that the structures or components are identical in size, composition, configuration, or any other property.

The following description may include examples to help enable one of ordinary skill in the art to practice the disclosed embodiments. The use of the terms “exemplary,” “by example,” and “for example,” means that the related description is explanatory, and though the scope of the disclosure is intended to encompass the examples and legal equivalents, the use of such terms is not intended to limit the scope of an embodiment or this disclosure to the specified components, operations, features, functions, or the like.

It will be readily understood that the components of the embodiments as generally described herein and illustrated in the drawings could be arranged and designed in a wide variety of different configurations. Thus, the following description of various embodiments is not intended to limit the scope of the present disclosure, but is merely representative of various embodiments. While the various aspects of the embodiments may be presented in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

Furthermore, specific implementations shown and described are only examples and should not be construed as the only way to implement the present disclosure unless specified otherwise herein. Elements, circuits, and functions may be shown in block diagram form in order not to obscure the present disclosure in unnecessary detail. Additionally, block definitions and partitioning of logic between various blocks are exemplary of a specific implementation. It will be readily apparent to one of ordinary skill in the art that the present disclosure may be practiced by numerous other partitioning solutions. For the most part, details concerning timing considerations and the like have been omitted where such details are not necessary to obtain a complete understanding of the present disclosure and are within the abilities of persons of ordinary skill in the relevant art.

Those of ordinary skill in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. Some drawings may illustrate signals as a single signal for clarity of presentation and description. It will be understood by a person of ordinary skill in the art that the signal may represent a bus of signals, wherein the bus may have a variety of bit widths and the present disclosure may be implemented on any number of data signals including a single data signal.

The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a special purpose processor, a digital signal processor (DSP), an Integrated Circuit (IC), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor (may also be referred to herein as a host processor or simply a host) may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, such as a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. A general-purpose computer including a processor is considered a special-purpose computer while the general-purpose computer is configured to execute computing instructions (e.g., software code) related to embodiments of the present disclosure.

The embodiments may be described in terms of a process that is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe operational acts as a sequential process, many of these acts may be performed in another sequence, in parallel, or substantially concurrently. In addition, the order of the acts may be re-arranged. A process may correspond to a method, a thread, a function, a procedure, a subroutine, a subprogram, other structure, or combinations thereof. Furthermore, the methods disclosed herein may be implemented in hardware, software, or both. If implemented in software, the functions may be stored or transmitted as one or more instructions or code on computer-readable media. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.

Any reference to an element herein using a designation such as “first,” “second,” and so forth does not limit the quantity or order of those elements, unless such limitation is explicitly stated. Rather, these designations may be used herein as a convenient method of distinguishing between two or more elements or instances of an element. Thus, a reference to first and second elements does not mean that only two elements may be employed there or that the first element must precede the second element in some manner. In addition, unless stated otherwise, a set of elements may include one or more elements.

As used herein, the term “substantially” in reference to a given parameter, property, or condition means and includes to a degree that one of ordinary skill in the art would understand that the given parameter, property, or condition is met with a small degree of variance, such as, for example, within acceptable manufacturing tolerances. By way of example, depending on the particular parameter, property, or condition that is substantially met, the parameter, property, or condition may be at least 90% met, at least 95% met, or even at least 99% met.

It may be useful for analysts of a proprietary system (e.g., scientific experiment or applied technology) to extract findable, accessible, interoperable, and reusable (i.e., “FAIR”) scientific data for public dissemination and provide the data to artificial intelligence/machine learning (AI/ML) researchers in a manner that may not be reverse-engineered and/or may prevent disclosure of classified or proprietary information. Despite the benefits of designing publicly available and benchmarkable scientific datasets to optimally realize the value of AI/ML techniques, it may not be safe to release data, including sanitized data, about proprietary systems due to possible undesirable repercussions against the systems and/or system administrators, including losing support from the public or investors.

It may be desirable to develop, within a physical system (i.e., a system that behaves according to some governing laws), an AI/ML benchmark data preparation system capable of concealing the identity of the associated system while retaining all the functional dependencies desired to allow researchers to process the data and optimize AI/ML techniques. Differing from conventional privacy preserving, data masking, and anonymization techniques, the present disclosure relates to an efficient and time-space scalable methodology based on a non-invertible deceptive infusion of data (DIOD) methodology. The methodology may be designed to preserve correlations among benchmark datasets, representing the target information harvested by AI/ML techniques, while obfuscating a fundamental structure of the system's confidential underlying governing laws. This may be achieved in a non-invertible manner that may not be self-learned from the benchmark datasets. The present disclosure, including the disclosed DIOD methodology, may represent a paradigm shift relative to data anonymity methods, such as k-anonymity, l-diversity, t-closeness, and m-invariance. Such data anonymity methods may focus on limiting access to structured datasets, such as, for example, personally identifiable information and health records. For protection of data, the conventional methods may either use non-discriminative loss of information or introduce ambiguity through means such as perturbation, encryption, or suppression that changes the inference characteristics of data and may impact the application of AI/ML techniques. The conventional methods provide limited protection because they do not impact the underlying governing structure. The obfuscation of data provenance may be compromised if enough data is provided through single or multiple data dissemination events. In contrast, various embodiments of the present disclosure focus on overcoming this limitation by obfuscating the underlying governing laws, without impacting the application of AI/ML methods. Various embodiments of the present disclosure expand the scope of data protection to scientific types of data, such as time-series data from critical experiments.

The DIOD methodology according to various embodiments of the present disclosure may use a non-unique AI/ML-invariant mathematical transformation for the disseminated data. The “invariance” property indicates that performance of AI/ML is not impacted by a transformation, whereas the “non-uniqueness” property means that any inference/inverse analysis attempting to estimate the transformation may not be successful. The present disclosure may include two or more operations including a decomposition operation and a fusion operation. In the decomposition operation, a randomized range finding algorithm may be used to decompose the benchmark multi-variate data into two sets of independent metadata. The first set is referred to as “fundamental metadata,” which describes the underlying governing laws, often represented by a combination of physics principles and constraints. As such, the fundamental metadata may be tied to an identity of the system that generated the benchmark datasets. The second set is denoted “inference metadata” and is used to train the AI/ML system. In the fusion operation, a mathematical kernel, employing a library of pre-calculated concealment operators, may be used to fuse the system-specific inference metadata with another set of fundamental metadata that are representative of a different system. The fusion operation allows preservation of the inference metadata, as is helpful for AI/ML training, while concealing the identity of the associated system to different levels of privacy, as is helpful for the owner of the data.
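By way of non-limiting illustration, the fusion idea described above may be sketched numerically. The sketch below is an assumption made for illustration only and is not the deception kernel of the present disclosure: it pairs the same inference weights (the hypothetical array `omega`) first with one orthonormal basis (`phi`, standing in for the original fundamental metadata) and then with an unrelated orthonormal basis (`psi`, standing in for a different system), and checks that the sample-to-sample inner products are unchanged. The sketch does not attempt to demonstrate non-invertibility.

```python
import numpy as np

rng = np.random.default_rng(0)

def orthonormal_basis(n_rows, rank, rng):
    """Return an n_rows x rank matrix with orthonormal columns (hypothetical helper)."""
    q, _ = np.linalg.qr(rng.standard_normal((n_rows, rank)))
    return q

n_points, rank, n_conditions = 200, 5, 40

# Stand-in for the fundamental metadata of the original system (assumption).
phi = orthonormal_basis(n_points, rank, rng)
# Stand-in for fundamental metadata representative of a different system (assumption).
psi = orthonormal_basis(n_points, rank, rng)
# Stand-in for the inference metadata: one column of weights per operating condition.
omega = rng.standard_normal((rank, n_conditions))

raw = phi @ omega     # data as the original system would produce it
fused = psi @ omega   # same inference weights fused with the other basis

# Inner products between samples depend only on omega when the basis is orthonormal.
gram_preserved = np.allclose(raw.T @ raw, fused.T @ fused)
data_differs = not np.allclose(raw, fused)
print(gram_preserved, data_differs)
```

In this sketch, any method that depends only on inner products or Euclidean distances between samples would see the same structure in `raw` and `fused`, even though the two arrays differ entry by entry.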

While the DIOD methodology according to various embodiments of the present disclosure is designed to protect the source data, its enabling algorithms may be based on reduced order modeling (ROM) techniques, which are adopted to reduce computational and storage requirements. These ROM techniques may enable efficient exchange of benchmark datasets and offer online application of the DIOD methodology at the data acquisition level. Further, this DIOD methodology may also enhance AI/ML research by emphasizing the concept of invariance in AI/ML learning, which is similar to the concept of physics law invariance. AI/ML learning may not be sub-optimally impacted by custom system knowledge, which is often embedded to constrain and guide the AI/ML learning.

Possible benefits of proprietary data dissemination to researchers may span multiple industries and topics. One non-limiting example of an application of various embodiments of the present disclosure may be in materials discovery, design, development, and deployment. Emerging materials may have a novel structure, designed with multi-functional properties to optimize performance for energy generation and storage, while simultaneously mitigating environmental impact. These stages in the material lifecycle have conventionally been treated largely as independent or weakly coupled. To effectively use AI/ML, data from all stages should be used, including high-fidelity modeling and simulation results, as well as process and performance parameters from manufacturing. Providing such detailed data for AI/ML processing may lead to reverse-engineering and identification of data provenance, which is a security/privacy concern for proprietary manufacturing data. The DIOD methodology according to various embodiments of the present disclosure may reduce or overcome this challenge and allow for data across all stages to be leveraged while masking the data provenance to a user-defined level of concealment.

In addition to becoming a new standard for FAIR-communication of proprietary scientific data to support advanced AI/ML learning, the various disclosed embodiments of DIOD methodology may open several new theoretical frontiers for the use of AI/ML techniques for the analysis of unique scientific data, whether experimentally-measured or simulated using high-fidelity modeling tools.

Research on the development and adoption of AI/ML techniques to improve the exploration and analysis of scientific systems has grown, in response to continued growth, complexity, and interconnectedness of systems, large volumes of collected data, and growth in computer processing power.

AI/ML techniques may achieve goals of improved data analysis via computationally efficient signature identification algorithms. These algorithms are capable of sifting through large volumes of data, i.e., recorded measurements and/or simulation data, and extracting information that is relevant for AI/ML inference. These signatures are mathematical classifiers capable of differentiating between different system states and becoming state aware. For example, AI/ML techniques may accelerate materials discovery, design, development, and deployment, regardless of the application, as long as they are supplied with a proper set of data.

The performance of AI/ML techniques should be carefully examined with techniques similar to model validation practices adopted in various fields. A key operation in any validation practice is the development of a benchmark model, which provides a common ground for researchers to test and compare methods. A conventional benchmark approach includes a clear description of the system and the subsystems layout, a comprehensive description of each subsystem model and associated design details, and a clear description of measurements and their uncertainties. While the adoption of this straightforward approach to test AI/ML techniques may be acceptable for a number of scenarios (e.g., design of new synthetic materials, imaging applications, feasibility of new simulation tools, etc.), this approach is unlikely to be adopted for high-valued classified or proprietary systems. A benchmark model for a high-valued system may allow for better system configurations that improve system function and resilience to various sources of uncertainties, representing the overarching goal of the application of AI/ML techniques. On the other hand, such benchmarks could be used to reverse engineer system information and functionality. This could potentially lead to identification of proprietary information and result in undesirable repercussions. Such undesirable identification may occur even if the benchmark model excludes key details from the released datasets. This is because AI/ML techniques are designed to identify patterns and association rules, and performance is greatly improved with knowledge of the governing laws, i.e., the “fundamental metadata” underlying system behavior as defined herein.

Therefore, proprietary and critical system details may be inferred from the released datasets, thus enabling malicious actors to gain access to protected information and an understanding of confidential functions. These concerns may discourage the owners of such systems from publicly disclosing any benchmark datasets even if heavily sanitized. An alternative method may be to hire private AI/ML resources, which may limit the benefits of the AI/ML validation procedure as compared to an approach that engages a wider research community. Thus, it may be helpful to develop AI/ML public benchmarks that may not be traced back to their associated systems. Depending on the level of identity masking desired, different levels of obfuscation strategies may be developed to hide various aspects of the associated original data identity. The DIOD methodology of the present disclosure is designed specifically to ensure optimal performance of AI/ML techniques for high-valued systems with masked data provenance.

Various embodiments of the disclosed DIOD methodology may include mathematical transformation of system benchmark datasets into a form that meets the following two-part criteria: (a) the benchmark data and their correlations may not be inverted, i.e., they may not be related back to the original system that generated the data; and (b) down to a preset tolerance, correlations (e.g., all of the correlations) in the original benchmark data may be rediscovered by AI/ML techniques in the DIOD benchmark data. These criteria allow researchers to devise strategies to test/validate AI/ML techniques without revealing the true identity of the system. The mathematical tools to enable such transformation rely on ideas from support vector machines, where instead of using simplified functions (e.g., radial basis functions, sigmoid functions, etc.), a template of pre-calculated known concealment operators, which is different from but may resemble the system's fundamental metadata, is employed to obfuscate a representation of the inference metadata extracted from the original system, i.e., to decouple the inference metadata from their original fundamental metadata representation and move them to a new representation. This may be done using rank-deficient transformations such that the deception process remains non-invertible, without impacting the inference metadata. In doing so, the AI/ML-relevant correlations existing between the raw (unmasked) data are preserved. Many of the decisions involved in the fusion operation, e.g., choice of the concealment operators, the kernel transformation, etc., are randomized using one-way hash functions such that the same raw (e.g., unmasked) benchmark data may be fused using different fundamental data corresponding to different systems. Thus, two groups of researchers may be given two different DIOD renditions of the same raw benchmark data.
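By way of non-limiting illustration, the hash-driven randomization of fusion decisions may be sketched as follows. The function name `select_operators`, the per-recipient identifier, the secret value, and the use of SHA-256 are all assumptions introduced for this sketch and are not specified by the present disclosure. The sketch only shows how a one-way hash can deterministically map a recipient to a repeatable choice of concealment operators from a library.

```python
import hashlib

def select_operators(recipient_id, secret, library_size, count):
    """Pick `count` distinct operator indices in [0, library_size) using SHA-256.

    Hypothetical helper: the same (recipient_id, secret) pair always yields the
    same selection, while the selection cannot be run backwards to the secret.
    """
    if count > library_size:
        raise ValueError("count must not exceed library_size")
    chosen = []
    counter = 0
    while len(chosen) < count:
        message = secret + recipient_id.encode("utf-8") + counter.to_bytes(4, "big")
        digest = hashlib.sha256(message).digest()
        # Reduce the first 8 digest bytes to an index (slight modulo bias is
        # accepted in this sketch).
        index = int.from_bytes(digest[:8], "big") % library_size
        if index not in chosen:
            chosen.append(index)
        counter += 1
    return chosen

secret = b"owner-held secret (illustrative)"
selection_a = select_operators("research-group-A", secret, library_size=50, count=4)
selection_a_again = select_operators("research-group-A", secret, library_size=50, count=4)
selection_b = select_operators("research-group-B", secret, library_size=50, count=4)
print(selection_a, selection_b)
```

Because the selection depends on the recipient identifier, two research groups would generally receive renditions built from different operators, while the data owner can regenerate either rendition on demand.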

Various embodiments of the disclosed DIOD methodology provide a generic theoretical framework for the testing of AI/ML techniques, reducing reliance on trial and error methods and customized AI/ML learning methods. By way of non-limiting example, in the healthcare field, knowledge extraction using data mining may be useful in classifying diseases and guiding physicians to optimize treatment strategies. Knowledge extraction may be achieved via a mathematical function, called a “classifier,” that is determined via an optimization-based training procedure against available data. Given that the training data contain private information such as patients' names, social security numbers, addresses, etc., which might be compromised due to attacks on enterprise computing systems, data anonymity (e.g., data obfuscation, data substitution, data masking, etc.) techniques may be useful to protect the private information. An objective of these techniques is to construct data mining models without violating the privacy of the data owners (i.e., the data mining model assumes no inference from the private data). In some instances, classifier accuracy is assumed to be insensitive to the private data. Conventional obfuscation techniques may include data alteration, perturbation, encryption, or suppression. These approaches may be effective when the private data, e.g., a patient's name or social security number, do not have an impact on the classifier function. In other situations, private data, such as sex or age, could have an impact on the classifier function. For such scenarios, encryption techniques may be employed, which may require decryption before application of the data mining model.

A goal of conventional privacy-preserving techniques may be to deny access to private data. In some instances, it may be desirable to ensure access to relevant correlations (e.g., all the relevant correlations) and association rules inherent in datasets. This may be helpful to increase the potential of AI/ML techniques in identifying hidden and new patterns that may be leveraged to develop better insight and ultimately improve the associated system function and its resilience to uncertainties. Limiting access to some of the data may have a deteriorating impact on the quality of the AI/ML models, thus defeating the purpose of their application. As such, any alteration of the data should be done to reduce impact on the application of AI/ML techniques.

In conventional privacy-preserving data mining research, an impact of privacy on quality of data mining models may sometimes be characterized in terms of the model's “utility.” A high utility generally implies low privacy and vice versa. Conventional privacy-preserving methods like data perturbations and randomization treat privacy and utility as a pair of conflicting constraints, thus leading to an optimization approach to find, in a sense, an optimal trade-off solution that maximizes privacy while maintaining an acceptable level of utility. This approach is not adequate for some applications because all data relevant to AI/ML application should be made available. Removal of any data, e.g., certain process variables or data collected at certain conditions, may reduce the inference information available for AI/ML techniques, leading to sub-optimal learning results. Various DIOD methodologies disclosed herein address this challenge by separating out, via a mathematical transformation, the benchmark datasets into two independent metadata sets: inference metadata, which is responsible for informing the AI/ML techniques, and fundamental metadata, which is tied to data provenance. Mathematically, AI/ML techniques are invariant to such decomposition.

Various DIOD methodology embodiments disclosed herein relate to the law of invariance observed in mathematical and theoretical physics, under which an observed pattern is considered a governing law if it does not change due to some mathematical transformation.

FIG. 1 is a flowchart depicting a method 100 for obfuscating fundamental metadata from benchmark raw data (e.g., a “DIOD methodology”), according to some embodiments.

In various embodiments disclosed herein, a DIOD methodology is related to the law of invariance observed in mathematical and theoretical physics. This law means that an observed pattern is considered a governing law if it does not change due to some mathematical transformation. In observing a system to discover its underlying inference metadata, the learned patterns should not be impacted by any processing of the observed data. AI/ML techniques should be rendered in an invariant manner. This has been realized in many areas, including, as non-limiting examples, handwritten text recognition, facial and object recognition, imaging, etc. Notable methods that may be rendered invariant include kNN (k Nearest Neighbors), kernelized methods, support vector machines, etc. Transformations may exist that do not impact performance of AI/ML techniques due to their invariance property, where the transformations are customized to meet user-defined concealment requirements.
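By way of non-limiting illustration, the invariance of a kNN query under a distance-preserving transformation may be sketched as follows. The sample points, the query, and the choice of a planar rotation are assumptions made purely for illustration; a rotation is used only because it is a simple transformation that leaves Euclidean distances unchanged, and it is not the transformation of the present disclosure.

```python
import math

def rotate(point, angle):
    """Rotate a 2-D point about the origin by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def nearest_neighbors(points, query, k):
    """Return the indices of the k points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]

# Illustrative data only.
points = [(0.0, 0.0), (1.0, 0.2), (0.9, 1.1), (3.0, 3.0), (-2.0, 0.5)]
query = (1.0, 1.0)
angle = 0.7

neighbors_raw = nearest_neighbors(points, query, k=3)
neighbors_rotated = nearest_neighbors(
    [rotate(p, angle) for p in points], rotate(query, angle), k=3
)
print(neighbors_raw, neighbors_rotated)
```

The coordinates of every point change under the rotation, yet the same three neighbors are returned in the same order, which is the sense in which a distance-based learner is unaffected by such a transformation.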

Method 100 may include obtaining benchmark raw data 102. Benchmark raw data 102 may include dependencies such as time and external dependencies 104 and underlying fundamental governing laws 106. The benchmark raw data 102 may be decomposed into fundamental metadata 110 (e.g., metadata representing the underlying fundamental governing laws 106) and inference metadata 112 (e.g., metadata representing anything that is not fundamental, including, for example, time and external dependencies 104). In some embodiments, the decomposition may be performed via a reduced order modeling (ROM) analysis technique 108. ROM analysis 108 may be an efficient mathematical construct capable of capturing the dominant features of system dynamics related to the benchmark raw data 102. In some embodiments, the ROM analysis 108 may describe variations of the system variables y using a decomposable expression of the form:

$$y(x,\alpha) \cong \sum_{i=1}^{r} \omega_i(\alpha)\,\varphi_i(x)$$

where x denotes position in the phase space (e.g., space and time) and α denotes a set of control variables that specify the conditions under which the system is being observed, e.g., experimental conditions, forcing functions, boundary and initial conditions, etc. The r functions φ_i(x) form a basis for an active subspace, which approximates possible variations for the system variables within a user-defined tolerance ϵ such that:

$$\left| y(x,\alpha) - \sum_{i=1}^{r} \omega_i(\alpha)\,\varphi_i(x) \right| < \epsilon$$

Active subspace functions may be captured (e.g., through some form of optimization) using randomized range finding algorithms. These functions are related to the underlying laws governing system behavior and its design specifications, including geometry details, compositions, and system proprietary information. The weights ω_i(α) may be influenced by control variables. Patterns in the system variables may be split into two sets of metadata. The first metadata (e.g., fundamental metadata 110, represented by φ) are determined by the underlying behavior operator and system configurations. By way of non-limiting example, fundamental metadata 110 may include proprietary information such as the geometry of the system, material composition, underlying differential equations, etc., that may disclose the identity of the system. The second metadata (e.g., inference metadata 112, represented by ω) are determined by the system operational conditions. By way of non-limiting example, inference metadata 112 may include information about the operational history of the system such as the temperature, mass flow rate, and other control parameters that are relevant to the AI/ML applications for optimization or inference purposes. The fundamental metadata 110 may be provided as input to the AI/ML techniques to help guide the optimal identification of the inference metadata 112. Metadata may be data that are derived from the raw data in support of identifying data provenance and enabling AI/ML-based inference.
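By way of non-limiting illustration, a decomposition of the above form may be sketched with a basic randomized range finder. The function name `decompose`, the oversampling parameter, and the synthetic snapshot data are assumptions made for this sketch; the disclosure does not specify a particular range finding algorithm, and the variant below (a random projection, a QR factorization, and a small singular value decomposition) is one commonly used form. Rows of the snapshot array index positions x, and columns index control settings α.

```python
import numpy as np

def decompose(snapshots, rank, oversample=5, seed=0):
    """Split snapshots (n_points x n_conditions) into basis functions and weights.

    Returns (phi, omega) with snapshots ~= phi @ omega, where the columns of phi
    play the role of the functions phi_i(x) and the rows of omega play the role
    of the weights omega_i(alpha). Hypothetical helper for illustration.
    """
    rng = np.random.default_rng(seed)
    n_conditions = snapshots.shape[1]
    # Probe the column space of the data with random test vectors.
    test_matrix = rng.standard_normal((n_conditions, rank + oversample))
    q, _ = np.linalg.qr(snapshots @ test_matrix)
    # Compress onto the sampled subspace and keep the dominant directions.
    u_small, _, _ = np.linalg.svd(q.T @ snapshots, full_matrices=False)
    phi = q @ u_small[:, :rank]
    omega = phi.T @ snapshots
    return phi, omega

# Synthetic rank-3 data, for illustration only.
x = np.linspace(0.0, 1.0, 100)
alpha = np.linspace(0.5, 2.0, 30)
snapshots = (
    np.outer(np.sin(np.pi * x), np.sin(alpha))
    + np.outer(x, alpha ** 2)
    + np.outer(np.exp(-x), np.cos(alpha))
)

phi, omega = decompose(snapshots, rank=3)
max_error = np.max(np.abs(snapshots - phi @ omega))
print(phi.shape, omega.shape, max_error < 1e-8)
```

Here `max_error` corresponds to the left-hand side of the tolerance condition above, evaluated over all sampled positions and control settings; for data that are not exactly low rank, the rank r would be increased until the error falls below the user-defined tolerance.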

The method 100 may include generating a library of generic concealment operators, as shown in operation 114. This operation 114 may be performed one time and may be focused on preparation of a pre-calculated library of operators that may be used to conceal the identity of the original system. This library may be generated seamlessly by applying a ROM decomposition similar to that described above to data from multiple standard test problems drawn from a variety of disciplines. The library may be developed to achieve two or more different levels of concealment. A first masking level may focus on building a benchmark model that represents a given class of systems (e.g., class of materials with desired general properties for a given application). The true identity of the concealed system within the class is to remain unidentifiable. A second masking level may be designed for adversarial scenarios where the true identity of the system is concealed and its class remains unidentifiable. The first masking level may be valuable to a wide range of science fields (e.g., materials, fusion, energy cells, etc.) interested in developing generic AI/ML techniques to improve performance across a scientific field in a manner that does not reveal the original data provenance. The second masking level may target highly critical or classified applications. The concealment operators generated in operation 114 may have some resemblance to the system benchmark raw data 102, or may be selected to be completely independent.
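By way of non-limiting illustration, a pre-calculated library of this kind may be sketched as follows. The three test problems, their names, and the function names are hypothetical stand-ins chosen for illustration only; the disclosure does not specify which standard test problems are used. A plain singular value decomposition is used here in place of the ROM analysis for brevity.

```python
import numpy as np

def leading_modes(snapshots, r):
    """Return the r leading left singular vectors (orthonormal columns)."""
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :r]

def build_concealment_library(n_x=128, n_cases=30, r=3):
    """Precompute candidate concealment operators psi_i from generic test problems."""
    x = np.linspace(0.0, 1.0, n_x)[:, None]       # position, stored as a column
    p = np.linspace(0.5, 2.0, n_cases)[None, :]   # one control parameter per snapshot
    # Hypothetical stand-ins for standard test problems from different disciplines.
    problems = {
        "damped_wave": np.exp(-p * x) * np.sin(2 * np.pi * p * x),
        "gaussian_pulse": np.exp(-((x - 0.5) ** 2) / (0.02 * p)),
        "power_profile": (x + 0.1) ** p,
    }
    return {name: leading_modes(Y, r) for name, Y in problems.items()}

# Built once, then reused whenever a data set needs to be masked.
library = build_concealment_library()
```

Each library entry holds orthonormal columns, which is the property relied upon later when the DIOD data are decomposed back into weights and concealment operators.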

The method 100 may combine (e.g., fuse) the decomposed metadata (e.g., fundamental metadata 110 and/or inference metadata 112) and the generated concealment operators using a deception kernel 116 to generate DIOD benchmark data 122, which hides the identity of the system and its associated governing laws. The DIOD benchmark data 122 may include inference metadata 118 and obfuscated fundamental metadata 120. In one or more embodiments, the inference metadata 118 is identical to, or functionally equivalent to, the inference metadata 112. In some embodiments, the deception kernel 116 may be represented by the following:

$$k(x',x) = \sum_{i=1}^{r} \psi_i(x')\,\phi_i^{*}(x).$$

The superscripted * denotes an inner product operation when the kernel is applied to a given function. This kernel is designed to take a given realization of the variables y(x,α) and generate its DIOD version, denoted by the primed variables, such that

$$y'(x',\alpha) \cong \int k(x',x)\,y(x,\alpha)\,dx,$$

where the functions ψ_(i)(x′) are members of the concealment operators generated at operation 114. The deception kernel is a transformation that effectively “overwrites” the fundamental metadata of the proprietary system with that of the generic system, thus masking the proprietary information. As explained earlier, DIOD benchmarks may completely obfuscate the identity of the system or may retain some features about the system or the data provenance depending on the ultimate goal of the benchmark application. When no obfuscation is desired, the concealment operators are set equal to the original basis functions: ψ_(i)(x)=φ_(i)(x). If the functions ψ_(i)(x′) are selected to be orthonormal, the DIOD data may be decomposed into:

$$y'(x',\alpha) \cong \sum_{i=1}^{r} \omega_i(\alpha)\,\psi_i(x').$$

This equation means that an AI/ML application to the original and DIOD data would yield the same inference metadata ω_(i)(α) (e.g., inference metadata 112). With invariant AI/ML techniques, an additional rotation and scaling type transformation P may be introduced to further obfuscate the inference data as follows:

$$y'(x',\alpha) \cong \sum_{i=1}^{r} \left[P\,\omega_i(\alpha)\right]\psi_i(x').$$

The DIOD kernel allows the space x′ to be generally different from the original benchmark space x. Depending on the sought level of concealment, the two spaces may be the same.
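By way of non-limiting illustration, a discretized form of the deception kernel may be sketched as follows. The sketch assumes that both sets of basis functions are sampled on grids and stored as matrices with orthonormal columns, that the integral over x reduces to a matrix product (quadrature weights absorbed into the basis), and that the transformation P is a pure rotation acting on the vector of weights. The random bases and all variable names are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_xp, r, n_alpha = 60, 80, 3, 25

# Stand-in orthonormal bases; in practice Phi would come from the ROM analysis
# and Psi from the concealment library. The x' grid differs from the x grid.
Phi, _ = np.linalg.qr(rng.standard_normal((n_x, r)))
Psi, _ = np.linalg.qr(rng.standard_normal((n_xp, r)))
Omega = rng.standard_normal((r, n_alpha))   # inference metadata omega_i(alpha)
Y = Phi @ Omega                             # original data y(x, alpha)

# Discrete deception kernel: sum over i of psi_i(x') times conj(phi_i(x)).
K = Psi @ Phi.conj().T
Y_diod = K @ Y                              # DIOD data y'(x', alpha)

# Projecting the DIOD data onto Psi returns the original weights.
Omega_from_diod = Psi.conj().T @ Y_diod

# Optional extra obfuscation: rotate the weights with an orthogonal matrix P.
P, _ = np.linalg.qr(rng.standard_normal((r, r)))
Y_diod_rotated = Psi @ (P @ Omega)
Omega_rotated = Psi.conj().T @ Y_diod_rotated
```

In this sketch the DIOD data carry the same weights as the original data while being expressed entirely in the concealment basis, and a holder of P can undo the rotation by applying its transpose.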

In one or more embodiments, an optional operation of method 100 is verification test(s) 124, which test the performance of the DIOD benchmark data 122. Verification tests may be designed using representative datasets from a scientific testing experiment. The selected datasets may be used to determine whether the AI/ML-based classifiers have the same classification accuracy, calculated in terms of the false positives and the decision boundaries for the classifiers, as the projected performance obtained using the benchmark raw (i.e., unmasked) data.

The verification tests 124 may be designed to represent a broad spectrum of system behavior and incorporate various cyclic and sparse patterns. A standard class of AI/ML-based classifiers may be selected for designing the test cases, e.g., support vector machines, kNN, neural networks, Lasso regression, long short-term memory neural networks for time-based regression, principal component analysis and a kernelized version. Distance measures (e.g., Euclidean, Manhattan, Chebyshev, and Mahalanobis) may be used for testing the performance of the raw benchmark data.

The robustness of the fundamentals-decomposition methodology from the ROM analysis 108 may be tested against various sources of uncertainties in the DIOD benchmark data. AI/ML techniques may be subject to instability (e.g., due to noise in the training data) that may lead to unpredictable and erroneous classification results. Thus, robustness tests may be designed for insensitivity of the decomposition algorithm to the various sources of uncertainties resulting from the raw benchmark data. Invariance of the distance measures to the DIOD transformation may be assessed, since they represent the basis for the majority of AI/ML-based classifiers. Mathematically, the deception kernel 116, denoted K, may be designed such that, for a given classifier f, the following identity holds:

$$f_{Ky}(Kz) = f_{y}(z),$$

where y is the original benchmark data used to train a classifier f, and f_(y)(z) are the results of the classification as applied to test data z. This equation implies that the same classifier trained on the transformed data, Ky, should give the same classification accuracy when applied to the transformed test data Kz. The disclosed deception kernel 116 is designed to satisfy this criterion. This is because, for example, using a kNN classifier, the class of a point is determined by the class of the nearby points. If the distance measure employed is invariant to transformation, the same classification may be rendered. The same conclusion applies to other kernelized methods, including support vector machines, which rely on distance measures and may be invariant to transformation.
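By way of non-limiting illustration, the invariance criterion may be checked numerically for a kNN classifier as follows. The sketch assumes Euclidean distance, orthonormal stand-in bases, and data lying in the span of the original basis, in which case the discretized kernel preserves pairwise distances. The minimal kNN routine, the two synthetic classes, and all names are hypothetical.

```python
import numpy as np

def knn_predict(train, labels, test, k=3):
    """Majority-vote kNN with Euclidean distance; samples are stored as columns."""
    dists = np.linalg.norm(train[:, :, None] - test[:, None, :], axis=0)
    nearest = np.argsort(dists, axis=0)[:k]   # indices of the k closest training points
    return np.array([np.bincount(labels[col]).argmax() for col in nearest.T])

rng = np.random.default_rng(0)
n_x, n_xp, r = 50, 70, 3
Phi, _ = np.linalg.qr(rng.standard_normal((n_x, r)))    # stand-in original basis
Psi, _ = np.linalg.qr(rng.standard_normal((n_xp, r)))   # stand-in concealment basis
K = Psi @ Phi.T                                         # discretized deception kernel

# Two synthetic classes that differ only in their weights (inference metadata).
labels = np.repeat([0, 1], 20)
omega_train = rng.standard_normal((r, 40)) + 3.0 * labels
omega_test = rng.standard_normal((r, 10)) + 3.0 * rng.integers(0, 2, size=10)
Y_train, Y_test = Phi @ omega_train, Phi @ omega_test

pred_raw = knn_predict(Y_train, labels, Y_test)            # f_y(z)
pred_diod = knn_predict(K @ Y_train, labels, K @ Y_test)   # f_Ky(Kz)
```

Under these assumptions the two prediction vectors agree, because the nearest neighbors of each test point are the same before and after the transformation.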

FIG. 2 is a block diagram depicting an apparatus 202 for obfuscating and providing inference metadata to an artificial intelligence engine 206 according to one or more embodiments of the present disclosure. The apparatus 202 may include a processing circuitry 208 and a communication terminal 204. Several functions of the processing circuitry 208 are discussed in more detail with reference to FIG. 1. For example, the processing circuitry 208 may perform the ROM analysis 108, perform the operation 114 of generating the library of concealment operators, and may fuse the concealment operators, the fundamental metadata 110, and the inference metadata 112 using the deception kernel 116.

The processing circuitry 208 may decompose raw data into fundamental metadata and inference metadata. In some embodiments, the processing circuitry 208 may decompose the raw data by passing the raw data through a reduced order model. The processing circuitry 208 may generate one or more concealment operators. The processing circuitry 208 may generate a deception kernel responsive to the inference metadata and the one or more concealment operators generated by the processing circuitry 208. In one or more embodiments, the processing circuitry 208 may generate the deception kernel responsive to the inference metadata, the one or more concealment operators, and the fundamental metadata. The processing circuitry 208 may obfuscate (e.g., mask, conceal, hide, blind, etc.) the fundamental metadata responsive to the one or more concealment operators and the deception kernel. In some embodiments, the deception kernel is configured to replace the fundamental metadata with the generated concealment operators. In one or more embodiments, the fundamental metadata is fused with the concealment operators to obfuscate the fundamental metadata responsive to the deception kernel. The processing circuitry 208 may provide the obfuscated fundamental metadata and the inference metadata to the artificial intelligence engine 206 via the communication terminal 204. The communication terminal 204 may transmit information to any outside destination, including public destinations for data sharing.

The artificial intelligence engine 206 may process the obfuscated fundamental metadata to extract trends, determine patterns, predict future outcomes, etc. In some embodiments, results and solutions determined from the processed information may be transmitted to the apparatus 202. In such cases, the communication terminal 204 may be configured to receive the information and provide the results, solutions, and/or processed information to the processing circuitry 208.

FIG. 3 is a block diagram depicting a system 302 for obfuscating and processing inference metadata, in accordance with one or more embodiments. The system 302 may include a deception engine 304 and an artificial intelligence engine 306. The deception engine 304 may be an example of an apparatus 202 of FIG. 2. The artificial intelligence engine 306 may be an example of an artificial intelligence engine 206 of FIG. 2.

The deception engine 304 may decompose raw data into fundamental metadata and inference metadata. The deception engine 304 may generate one or more concealment operators. The deception engine 304 may further generate a deception kernel responsive to the inference metadata and the one or more concealment operators. The deception engine 304 may obfuscate the fundamental metadata responsive to the one or more concealment operators and the deception kernel, and may provide the obfuscated fundamental metadata and the inference metadata to the artificial intelligence engine 306.

The artificial intelligence engine 306 may receive the obfuscated fundamental metadata and the inference metadata from the deception engine 304 for processing. The artificial intelligence engine 306 may process the received data and provide the resulting solutions and/or processed data to the deception engine 304.

FIG. 4 is a flowchart depicting a method 400 for operating a data processing network, according to one or more embodiments of the present disclosure. In operation 402, method 400 decomposes raw data into fundamental metadata (e.g., fundamental metadata 110 of FIG. 1) and inference metadata (e.g., inference metadata 112 of FIG. 1). ROM analysis 108 of FIG. 1 may be an example of operation 402. In operation 404, method 400 generates one or more concealment operators. Operation 114 of FIG. 1 may be an example of operation 404. In one or more embodiments, method 400 may generate one or more concealment operators by decomposing a second set of raw data. In some cases, the second set of raw data may be related to the raw data and in other cases, the second set of raw data may be unrelated to the raw data. In operation 406, method 400 generates a deception kernel responsive to the inference metadata and the one or more concealment operators. The deception kernel 116 of FIG. 1 may be an example of the deception kernel that is generated in operation 406. In operation 408, method 400 obfuscates the fundamental metadata responsive to the one or more concealment operators and the deception kernel. In operation 410, method 400 provides the obfuscated fundamental metadata and the inference metadata to an artificial intelligence engine for processing. In optional operation 412, method 400 may verify the obfuscated fundamental metadata by comparing a performance of the obfuscated fundamental metadata with a performance of the raw data.

As used in the present disclosure, the terms “module” or “component” may refer to specific hardware implementations configured to perform the actions of the module or component and/or software objects or software routines that may be stored on and/or executed by general purpose hardware (e.g., computer-readable media, processing devices, without limitation) of the computing system. In some embodiments, the different components, modules, engines, and services described in the present disclosure may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). While some of the devices, systems, and methods described in the present disclosure are generally described as being implemented in software (stored on and/or executed by general purpose hardware), specific hardware implementations or a combination of software and specific hardware implementations are also possible and contemplated.

As used in the present disclosure, the term “combination” with reference to a plurality of elements may include a combination of all the elements or any of various different sub-combinations of some of the elements. For example, the phrase “A, B, C, D, or combinations thereof” may refer to any one of A, B, C, or D; the combination of each of A, B, C, and D; and any sub-combination of A, B, C, or D such as A, B, and C; A, B, and D; A, C, and D; B, C, and D; A and B; A and C; A and D; B and C; B and D; or C and D.

Terms used in the present disclosure and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including, but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes, but is not limited to,” etc.).

Additionally, if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations.

In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” or “one or more of A, B, and C, etc.” is used, in general such a construction is intended to include A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B, and C together, etc.

Further, any disjunctive word or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” should be understood to include the possibilities of “A” or “B” or “A and B.”

While the present disclosure has been described herein with respect to certain illustrated embodiments, those of ordinary skill in the art will recognize and appreciate that the present disclosure is not so limited. Rather, many additions, deletions, and modifications to the illustrated and described embodiments may be made without departing from the scope of the disclosure as hereinafter claimed along with their legal equivalents. In addition, features from one embodiment may be combined with features of another embodiment while still being encompassed within the scope of the disclosure as contemplated by the applicant.

What is claimed is:
 1. An apparatus, comprising: a communication terminal configured to transmit information to an artificial intelligence engine; and a processing circuitry configured to: decompose raw data into fundamental metadata and inference metadata; generate one or more concealment operators; generate a deception kernel responsive to the inference metadata and the one or more concealment operators; obfuscate the fundamental metadata responsive to the one or more concealment operators and the deception kernel; and provide the obfuscated fundamental metadata and the inference metadata to the artificial intelligence engine for processing.
 2. The apparatus of claim 1, wherein the processing circuitry is configured to generate the deception kernel responsive to the inference metadata, the one or more concealment operators, and the fundamental metadata.
 3. The apparatus of claim 2, wherein the processing circuitry is configured to obfuscate the fundamental metadata by fusing the concealment operators and the fundamental metadata together.
 4. The apparatus of claim 1, wherein the processing circuitry is configured to obfuscate the fundamental metadata by replacing the fundamental metadata with the concealment operators.
 5. The apparatus of claim 1, wherein the processing circuitry is configured to decompose the raw data into the fundamental metadata and the inference metadata by passing the raw data through a reduced order model.
 6. The apparatus of claim 1, wherein the communication terminal is configured to receive information from the artificial intelligence engine.
 7. The apparatus of claim 1, wherein the processing circuitry is configured to generate one or more concealment operators by decomposing a second set of raw data.
 8. The apparatus of claim 1, wherein the processing circuitry is configured to generate at least two sets of concealment operators, where a first set of concealment operators has a first security level and a second set of concealment operators has a second security level.
 9. The apparatus of claim 1, wherein the processing circuitry is configured to: generate the one or more concealment operators responsive to first one-way hash functions; and generate the deception kernel responsive to second one-way hash functions.
 10. The apparatus of claim 1, wherein the fundamental metadata represents underlying governing laws related to a system described by the raw data.
 11. The apparatus of claim 1, wherein the inference metadata represents data used to train the artificial intelligence engine.
 12. A system, comprising: a deception engine configured to: decompose raw data into fundamental metadata and inference metadata; generate one or more concealment operators; generate a deception kernel responsive to the inference metadata and the one or more concealment operators; and obfuscate the fundamental metadata responsive to the one or more concealment operators and the deception kernel; and an artificial intelligence engine configured to: receive data from the deception engine, the data comprising the obfuscated fundamental metadata and the inference metadata; process the received data; and provide the processed received data or an artificial intelligence method responsive to the processed received data to the deception engine.
 13. The system of claim 12, wherein the deception engine is configured to provide the obfuscated fundamental metadata and the inference metadata to the artificial intelligence engine.
 14. The system of claim 12, wherein the deception engine is configured to generate the deception kernel responsive to the inference metadata, the one or more concealment operators, and the fundamental metadata.
 15. The system of claim 12, wherein the artificial intelligence engine is configured to process the inference metadata of the received data.
 16. The system of claim 12, wherein the deception engine is configured to compare a performance of the raw data to a performance of the processed data received from the artificial intelligence engine.
 17. A method, comprising: decomposing raw data into fundamental metadata and inference metadata; generating one or more concealment operators; generating a deception kernel responsive to the inference metadata and the one or more concealment operators; obfuscating the fundamental metadata responsive to the one or more concealment operators and the deception kernel; and providing the obfuscated fundamental metadata and the inference metadata to an artificial intelligence engine for processing.
 18. The method of claim 17, wherein decomposing the raw data into the fundamental metadata and the inference metadata comprises passing the raw data through a reduced order model.
 19. The method of claim 17, further comprising verifying the obfuscated fundamental metadata by comparing a performance of the obfuscated fundamental metadata with a performance of the raw data.
 20. The method of claim 17, wherein generating the one or more concealment operators comprises decomposing a second set of raw data.